refactor(e2e): migrate to external oadp-must-gather container image #2007

Draft
kaovilai wants to merge 3 commits into openshift:oadp-dev from kaovilai:no-mor-mustgather

@kaovilai
Member

@kaovilai kaovilai commented Nov 4, 2025

Summary

Remove local must-gather directory and build process in favor of using the external quay.io/konveyor/oadp-must-gather:latest image via oc adm must-gather. This eliminates architecture mismatch issues and keeps must-gather code in its dedicated repository at https://github.com/openshift/oadp-must-gather.

Changes

  • Updated RunMustGather() in tests/e2e/lib/apps.go to use oc adm must-gather command
  • Added MUST_GATHER_IMAGE environment variable (defaults to quay.io/konveyor/oadp-must-gather:latest)
  • Removed build-must-gather target from Makefile
  • Removed entire must-gather/ directory (3,174 lines deleted)
  • Updated documentation in docs/developer/testing/TESTING.md

Benefits

  • ✅ No more architecture mismatch issues between local build and cluster
  • ✅ Must-gather code maintained in dedicated repository
  • ✅ Simpler build process (no local compilation needed)
  • ✅ Easy to override image for version-specific testing via MUST_GATHER_IMAGE env var
  • SKIP_MUST_GATHER flag preserved for skipping collection

Usage

Default behavior (uses latest):

make test-e2e

With custom image:

MUST_GATHER_IMAGE=quay.io/konveyor/oadp-must-gather:oadp-1.4 make test-e2e

Skip must-gather:

SKIP_MUST_GATHER=true make test-e2e

Fixes #2005

Testing

  • Code compiles successfully
  • No remaining references to local must-gather binary
  • Documentation updated

Note

Responses generated with Claude

Summary by CodeRabbit

  • Refactor

    • Must-gather now uses an image-built workflow instead of local compilation during test setup.
  • New Features

    • Introduced MUST_GATHER_IMAGE, MUST_GATHER_REPO, and MUST_GATHER_BRANCH to control image source; conditional image build/push when a repo is provided.
  • Documentation

    • Updated E2E testing docs to describe new variables and SKIP_MUST_GATHER semantics.
  • Chores

    • Removed legacy must-gather source, README, and deprecated wrapper scripts.

@coderabbitai
Contributor

coderabbitai Bot commented Nov 4, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Removes the local must-gather implementation (sources, build, module, and docs), deletes deprecated must-gather wrapper scripts, and updates Makefile and E2E tests to optionally build/push a must-gather container image from an external repo or run must-gather via a configurable image (MUST_GATHER_IMAGE). Tests discover the summary via a glob under the must-gather artifacts.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Makefile & build: Makefile | Reworked must-gather build: added MUST_GATHER_REPO/MUST_GATHER_BRANCH vars, new build-must-gather target that clones a repo, builds/pushes an image via $(CONTAINER_TOOL), and conditionally runs during test-e2e when MUST_GATHER_REPO is set. Removed prior Go binary build and SKIP_MUST_GATHER gating; test-e2e-setup no longer depends on build-must-gather. |
| E2E tests: tests/e2e/lib/apps.go | Replaced local must-gather binary usage with oc adm must-gather --image=${MUST_GATHER_IMAGE} --dest-dir=...; captures combined stdout/stderr on failure, truncates clusterID to 8 chars, and locates oadp-must-gather-summary.md via glob under artifact clusters directory. |
| Docs: docs/developer/testing/TESTING.md | Clarified SKIP_MUST_GATHER semantics and added MUST_GATHER_IMAGE, MUST_GATHER_REPO, and MUST_GATHER_BRANCH entries and behavior (including default image and auto-setting when repo provided). |
| Removed must-gather source & build: must-gather/Dockerfile, must-gather/go.mod, must-gather/cmd/main.go | Deleted multi-stage Dockerfile, Go module file, and CLI main entrypoint; removed local build artifacts and module metadata. |
| Removed must-gather packages: must-gather/pkg/cli.go, must-gather/pkg/gather/gather.go, must-gather/pkg/gvk/gvk.go, must-gather/pkg/templates/summary.go | Deleted CLI implementation, gather helper, GVK declarations, and templating/summary generation logic (many exported functions/variables removed). |
| Docs & deprecated scripts: must-gather/README.md, must-gather/deprecated/* | Removed must-gather README and all deprecated wrapper scripts under must-gather/deprecated/. |

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test Runner
  participant OC as oc CLI
  participant Reg as Image Registry
  participant K8s as Cluster API
  participant FS as Filesystem

  Test->>OC: run "oc adm must-gather --image=${MUST_GATHER_IMAGE} --dest-dir=/tmp/mg"
  OC->>Reg: pull ${MUST_GATHER_IMAGE}
  OC->>K8s: collect cluster resources/plugins
  K8s-->>OC: resource responses
  OC->>FS: write artifacts to /tmp/mg/clusters/<clusterID>/
  Test->>FS: glob for oadp-must-gather-summary.md and validate contents

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | The RunMustGather function executes without timeout protection and test assertions lack meaningful failure messages. | Add context-based timeout (30 minutes) to exec.Command and include error messages in all Expect assertions. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly describes the main change: migrating to an external oadp-must-gather container image, which aligns with the primary objective of removing the local must-gather directory. |
| Description check | ✅ Passed | The PR description is well-structured with clear sections (Summary, Changes, Benefits, Usage, Testing) that exceed the template requirements and thoroughly document the changes made. |
| Linked Issues check | ✅ Passed | The PR substantially addresses issue #2005 by removing the local must-gather code and switching to the external image via oc adm must-gather, enabling tests to use the dedicated repository instead of local builds. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: Makefile refactoring to use external images, documentation updates, test code changes to invoke oc adm must-gather, and removal of local must-gather code. No unrelated modifications detected. |
| Stable And Deterministic Test Names | ✅ Passed | All test names use static, descriptive strings without dynamic content like pod names, timestamps, UUIDs, or node identifiers. |
| Microshift Test Compatibility | ✅ Passed | PR does not add any new Ginkgo e2e tests; changes only affect must-gather implementation and RunMustGather() utility function. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not add any new Ginkgo e2e test cases; only a utility function helper is updated. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR does not modify deployment manifests, operator controllers, or scheduling constraints. Changes are limited to build/test infrastructure only. |
| Ote Binary Stdout Contract | ✅ Passed | The PR does not introduce process-level stdout writes that violate the OTE Binary Stdout Contract. RunMustGather() is a test helper function using log.Printf() which writes to stderr by default, not stdout. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR does not add new Ginkgo e2e test cases, only modifies existing helper functions without IPv4-specific assumptions. |

@openshift-ci

openshift-ci Bot commented Nov 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 4, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/lib/apps.go (1)

399-410: Add bounds checking before slicing ClusterID.

Line 409 will panic if clusterVersion.Spec.ClusterID is shorter than 8 characters. While cluster IDs are typically UUIDs (36 characters), defensive programming requires bounds checking.

Apply this diff to add bounds checking:

 	clusterVersion := &clusterVersionList.Items[0]
-	clusterID := string(clusterVersion.Spec.ClusterID[:8])
+	clusterIDStr := string(clusterVersion.Spec.ClusterID)
+	if len(clusterIDStr) < 8 {
+		return fmt.Errorf("cluster ID is too short: %s", clusterIDStr)
+	}
+	clusterID := clusterIDStr[:8]
🧹 Nitpick comments (1)
tests/e2e/lib/apps.go (1)

422-432: Consider logging which summary file is being read.

When multiple matches exist (e.g., from previous test runs), line 423 silently uses the first match. Adding a log statement would help with debugging.

 	// Read and validate the summary
+	log.Printf("Reading must-gather summary from: %s", matches[0])
 	mustGatherSummaryContent, err := os.ReadFile(matches[0])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 513180a and f352b31.

⛔ Files ignored due to path filters (1)
  • must-gather/go.sum is excluded by !**/*.sum
📒 Files selected for processing (24)
  • Makefile (1 hunks)
  • docs/developer/testing/TESTING.md (1 hunks)
  • must-gather/Dockerfile (0 hunks)
  • must-gather/README.md (0 hunks)
  • must-gather/cmd/main.go (0 hunks)
  • must-gather/deprecated/gather_1h (0 hunks)
  • must-gather/deprecated/gather_1h_essential (0 hunks)
  • must-gather/deprecated/gather_24h (0 hunks)
  • must-gather/deprecated/gather_24h_essential (0 hunks)
  • must-gather/deprecated/gather_6h (0 hunks)
  • must-gather/deprecated/gather_6h_essential (0 hunks)
  • must-gather/deprecated/gather_72h (0 hunks)
  • must-gather/deprecated/gather_72h_essential (0 hunks)
  • must-gather/deprecated/gather_all (0 hunks)
  • must-gather/deprecated/gather_all_essential (0 hunks)
  • must-gather/deprecated/gather_metrics_dump (0 hunks)
  • must-gather/deprecated/gather_with_timeout (0 hunks)
  • must-gather/deprecated/gather_without_tls (0 hunks)
  • must-gather/go.mod (0 hunks)
  • must-gather/pkg/cli.go (0 hunks)
  • must-gather/pkg/gather/gather.go (0 hunks)
  • must-gather/pkg/gvk/gvk.go (0 hunks)
  • must-gather/pkg/templates/summary.go (0 hunks)
  • tests/e2e/lib/apps.go (2 hunks)
💤 Files with no reviewable changes (21)
  • must-gather/deprecated/gather_metrics_dump
  • must-gather/deprecated/gather_1h
  • must-gather/deprecated/gather_72h
  • must-gather/go.mod
  • must-gather/deprecated/gather_72h_essential
  • must-gather/deprecated/gather_24h_essential
  • must-gather/deprecated/gather_all_essential
  • must-gather/deprecated/gather_with_timeout
  • must-gather/pkg/gather/gather.go
  • must-gather/cmd/main.go
  • must-gather/README.md
  • must-gather/deprecated/gather_without_tls
  • must-gather/deprecated/gather_6h
  • must-gather/deprecated/gather_1h_essential
  • must-gather/pkg/cli.go
  • must-gather/deprecated/gather_all
  • must-gather/deprecated/gather_6h_essential
  • must-gather/Dockerfile
  • must-gather/pkg/gvk/gvk.go
  • must-gather/pkg/templates/summary.go
  • must-gather/deprecated/gather_24h
🔇 Additional comments (4)
Makefile (1)

792-792: LGTM! Dependency simplified correctly.

The removal of the build-must-gather dependency aligns with the PR objective to use an external container image instead of building locally.

docs/developer/testing/TESTING.md (1)

28-29: LGTM! Documentation clearly describes the new environment variables.

The documentation accurately reflects the new must-gather behavior with environment-driven image selection.

tests/e2e/lib/apps.go (2)

381-387: LGTM! Environment variable handling is correct.

The implementation properly reads the MUST_GATHER_IMAGE environment variable with a sensible default value, and includes helpful comments about version-specific usage.


389-397: LGTM! Command execution and error handling look good.

The oc adm must-gather command is constructed correctly with appropriate flags, and errors are captured with output for debugging.

Comment thread tests/e2e/lib/apps.go Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/e2e/lib/apps.go (1)

411-425: Consider using a wildcard pattern for better robustness.

The current pattern derivation logic (lines 415-418) attempts to mirror how oc adm must-gather names directories, but this could fail if the actual naming logic differs or if edge cases exist (e.g., registries with multiple dots, special characters in tags). The past review suggested using a simpler wildcard pattern, which would be more resilient.

Apply this diff to use a wildcard pattern and add debugging information:

 	// Find the must-gather output directory
-	// oc adm must-gather creates a directory based on the image name with registry separators
-	// replaced by hyphens. E.g., quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-*
-	// We need to derive the pattern from the actual image being used
-	imagePattern := strings.ReplaceAll(mustGatherImage, ":", "-")
-	imagePattern = strings.ReplaceAll(imagePattern, "/", "-")
-	imagePattern = strings.ReplaceAll(imagePattern, ".", "-")
-	pattern := filepath.Join(artifact_dir, imagePattern+"-*", "clusters", clusterID, "oadp-must-gather-summary.md")
+	// oc adm must-gather creates: <artifact_dir>/<image-based-dir>/clusters/<cluster-id>/
+	// Use a wildcard pattern to match any must-gather output directory
+	pattern := filepath.Join(artifact_dir, "*", "clusters", clusterID, "oadp-must-gather-summary.md")
 	matches, err := filepath.Glob(pattern)
 	if err != nil {
 		return fmt.Errorf("error finding must-gather summary: %w", err)
 	}
 	if len(matches) == 0 {
-		return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
+		// List what directories exist to help debug
+		dirs, _ := filepath.Glob(filepath.Join(artifact_dir, "*"))
+		return fmt.Errorf("no must-gather summary found at pattern: %s\nDirectories in artifact_dir: %v", pattern, dirs)
 	}

This approach is simpler, more robust, and provides better debugging information when the pattern fails to match.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between f352b31 and ac8bcc6.

📒 Files selected for processing (1)
  • tests/e2e/lib/apps.go (2 hunks)
🔇 Additional comments (3)
tests/e2e/lib/apps.go (3)

381-387: LGTM! Clean environment variable handling.

The environment variable handling is well-implemented with a sensible default and helpful inline documentation for users who need version-specific images.


389-397: LGTM! Proper command execution and error handling.

The command execution correctly uses CombinedOutput and wraps errors with the command output for better debugging.


427-441: LGTM! Validation logic is sound.

The summary file reading and validation correctly checks for the expected success message. The TODO comment appropriately flags future enhancement opportunities.

Comment thread tests/e2e/lib/apps.go Outdated
@kaovilai
Member Author

kaovilai commented Nov 4, 2025

/retest

ai-retester: The OADP E2E test failed because the "Mongo application CSI" test timed out while waiting for the todolist pod to succeed. Consequently, the must-gather process also failed due to missing summary files.

The e2e-test-cli-aws-e2e step failed because the Mongo application CSI via CLI test timed out waiting for the todolist-6d7bb9554c-wdtkj pod to become ready, and the must-gather process failed, so no summary was generated. Specifically, the pod was stuck in the PodInitializing state. This suggests a problem with the application deployment or configuration in the test environment.

The Mongo application CSI test timed out after 540 seconds, and further attempts to run it also failed due to the todolist pod failing to initialize. The Must-gather stage also failed because it couldn't find a summary.

@kaovilai
Member Author

kaovilai commented Nov 4, 2025

ai-retester: The OADP E2E test failed because the "Mongo application CSI" test timed out while waiting for the todolist pod to succeed. Consequently, the must-gather process also failed due to missing summary files.

The e2e-test-cli-aws-e2e step failed because the Mongo application CSI via CLI test timed out waiting for the todolist-6d7bb9554c-wdtkj pod to become ready, and the must-gather process failed, so no summary was generated. Specifically, the pod was stuck in the PodInitializing state. This suggests a problem with the application deployment or configuration in the test environment.

The Mongo application CSI test timed out after 540 seconds, and further attempts to run it also failed due to the todolist pod failing to initialize. The Must-gather stage also failed because it couldn't find a summary.

Good: the must-gather artifacts at https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_oadp-operator/2007/pull-ci-openshift-oadp-operator-oadp-dev-4.19-e2e-test-aws/1985812568559587328/artifacts/e2e-test-aws/e2e/artifacts/Mongo%20application%20CSI/ are still populated, which means this PR works.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 513180a and 2 for PR HEAD ac8bcc6 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 5048a8f and 1 for PR HEAD ac8bcc6 in total

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2025
@openshift-ci

openshift-ci Bot commented Nov 5, 2025

New changes are detected. LGTM label has been removed.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@kaovilai
Member Author

kaovilai commented Nov 5, 2025

updated after rebase

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/e2e/lib/apps.go (1)

409-409: Potential panic: validate ClusterID length before slicing.

This is the same critical issue flagged in the previous review. The slice operation [:8] will panic at runtime if ClusterID is shorter than 8 characters.

Apply this diff to safely handle shorter ClusterIDs:

-	clusterID := string(clusterVersion.Spec.ClusterID[:8])
+	clusterID := string(clusterVersion.Spec.ClusterID)
+	if len(clusterID) > 8 {
+		clusterID = clusterID[:8]
+	}
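
The truncate-only-when-longer behaviour in the diff above can be factored into a small helper. This is a minimal sketch: `shortClusterID` is a hypothetical name, and it takes a plain `string` so the example does not depend on the OpenShift API types used in the real test code.

```go
package main

import "fmt"

// shortClusterID (hypothetical helper) returns at most the first n bytes of
// id and never panics on short input. Byte slicing is adequate here on the
// assumption that cluster IDs are ASCII UUIDs, as the review notes.
func shortClusterID(id string, n int) string {
	if n < 0 {
		n = 0
	}
	if len(id) > n {
		return id[:n]
	}
	return id
}

func main() {
	fmt.Println(shortClusterID("0f8b6c1e-1111-2222-3333-444455556666", 8)) // prints 0f8b6c1e
	fmt.Println(shortClusterID("abc", 8))                                  // prints abc
}
```

Compared with the earlier suggestion that returned an error for short IDs, this variant degrades gracefully: a short ID is used as-is, and any real mismatch then surfaces later as a failed summary lookup.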
🧹 Nitpick comments (1)
tests/e2e/lib/apps.go (1)

423-425: Enhance error message with available directories for debugging.

When the pattern fails to match, listing the actual directories in artifact_dir would help diagnose whether the issue is pattern mismatch or missing must-gather output.

Apply this diff:

 	if len(matches) == 0 {
-		return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
+		// List what directories exist to help debug
+		dirs, _ := filepath.Glob(filepath.Join(artifact_dir, "*"))
+		return fmt.Errorf("no must-gather summary found at pattern: %s\nDirectories in artifact_dir: %v", pattern, dirs)
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ac8bcc6 and 2ae6d45.

⛔ Files ignored due to path filters (1)
  • must-gather/go.sum is excluded by !**/*.sum
📒 Files selected for processing (24)
  • Makefile (1 hunks)
  • docs/developer/testing/TESTING.md (1 hunks)
  • must-gather/Dockerfile (0 hunks)
  • must-gather/README.md (0 hunks)
  • must-gather/cmd/main.go (0 hunks)
  • must-gather/deprecated/gather_1h (0 hunks)
  • must-gather/deprecated/gather_1h_essential (0 hunks)
  • must-gather/deprecated/gather_24h (0 hunks)
  • must-gather/deprecated/gather_24h_essential (0 hunks)
  • must-gather/deprecated/gather_6h (0 hunks)
  • must-gather/deprecated/gather_6h_essential (0 hunks)
  • must-gather/deprecated/gather_72h (0 hunks)
  • must-gather/deprecated/gather_72h_essential (0 hunks)
  • must-gather/deprecated/gather_all (0 hunks)
  • must-gather/deprecated/gather_all_essential (0 hunks)
  • must-gather/deprecated/gather_metrics_dump (0 hunks)
  • must-gather/deprecated/gather_with_timeout (0 hunks)
  • must-gather/deprecated/gather_without_tls (0 hunks)
  • must-gather/go.mod (0 hunks)
  • must-gather/pkg/cli.go (0 hunks)
  • must-gather/pkg/gather/gather.go (0 hunks)
  • must-gather/pkg/gvk/gvk.go (0 hunks)
  • must-gather/pkg/templates/summary.go (0 hunks)
  • tests/e2e/lib/apps.go (2 hunks)
💤 Files with no reviewable changes (21)
  • must-gather/deprecated/gather_all_essential
  • must-gather/deprecated/gather_metrics_dump
  • must-gather/Dockerfile
  • must-gather/deprecated/gather_72h
  • must-gather/deprecated/gather_24h_essential
  • must-gather/deprecated/gather_1h
  • must-gather/deprecated/gather_6h
  • must-gather/deprecated/gather_1h_essential
  • must-gather/go.mod
  • must-gather/deprecated/gather_all
  • must-gather/pkg/cli.go
  • must-gather/deprecated/gather_24h
  • must-gather/deprecated/gather_without_tls
  • must-gather/deprecated/gather_with_timeout
  • must-gather/cmd/main.go
  • must-gather/pkg/gather/gather.go
  • must-gather/README.md
  • must-gather/pkg/gvk/gvk.go
  • must-gather/deprecated/gather_72h_essential
  • must-gather/deprecated/gather_6h_essential
  • must-gather/pkg/templates/summary.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/developer/testing/TESTING.md
  • Makefile
🔇 Additional comments (1)
tests/e2e/lib/apps.go (1)

381-397: LGTM: Environment-driven must-gather execution.

The environment variable setup and command execution are well-implemented:

  • Sensible default image with clear upgrade path via MUST_GATHER_IMAGE
  • Proper use of oc adm must-gather with explicit flags
  • Comprehensive error handling with command output included

Comment thread tests/e2e/lib/apps.go Outdated
@kaovilai
Member Author

kaovilai commented Nov 5, 2025

/retest

ai-retester: The end-to-end (e2e) test e2e-test-cli-aws failed because the Mongo application CSI via CLI test timed out while waiting for the todolist pod to reach the succeeded state. Additionally, the must-gather utility failed to find a summary file. Container test exited with code 2, reason Error.

Why the Mongo test blew up

The entire CI run was green until we hit the Mongo application CSI test.
That test finished the initial phases (DPA & Velero were up, the BSL was
available, and the Mongo/todolist pods were created) and then we hit the
pausing points:

…waiting for pod to be ready…
  Pod not yet succeeded
…
container "todolist" in pod "todolist-…" is waiting to start: PodInitializing

The test had just finished installing the Mongo database and the
initial‑service container “init‑myservice”. It was the todolist container
that never progressed from PodInitializing to Running. There were no
events showing the image being pulled or started – the only event was the
init container that simply sleeps forever. The pod never found the
quay.io/migtools/oadp-ci-todolist-mongo-go:latest image, so the container
was stuck in PodInitializing and the Eventually guard timed out after
540 s.

In short, the todolist container could not start (most likely the image
was missing/incorrect), so the backup/restore cycle for Mongo never reached
the restore‑verify step, producing the single test failure.

Short failure summary

  1. Mongo CSI backup‑restore test
    The todolist pod never reached a running state.
    The test created a MongoDB pod and a todolist pod that depended on it.
    The todolist container was stuck in the PodInitializing phase for over 20 minutes, eventually timing out after 540 s.
    Parallel‑Pod event output shows the pod was repeatedly re‑started and never received the “Running” status.
    Because the test uses the Eventually assertion, the container stayed in PodInitializing and the test failed, causing the whole suite to abort.

  2. Must‑Gather for the test
    The final post‑test must‑gather step could not find the expected oadp‑must‑gather‑summary.md file, so it returned an error.

So, the single failure occurred when the Mongo todolist pod failed to start (likely a race or resource issue), and the must‑gather step also failed to locate its summary file.

@kaovilai
Member Author

kaovilai commented Nov 7, 2025

/retest

1 similar comment
@kaovilai
Member Author

/retest

@weshayutin
Contributor

ugh.. AI reviews :)
So.. I'm all for having the option to run w/ the container... HOWEVER..
We need to retain the ability to run from source or a pr from the must-gather repo. This was of course much easier when must-gather lived in the operator repo.

Is this not running the must-gather atm in CI? I'll look

@kaovilai
Member Author

Feedback is to build from source of a must-gather repo clone/local-dir so that must-gather can be developed on and tested in e2e w/o having to push an image to registry.

@kaovilai kaovilai marked this pull request as draft November 11, 2025 22:35
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 14, 2026
@openshift-merge-robot

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kaovilai and others added 2 commits April 30, 2026 18:50
Remove local must-gather directory and build process in favor of using the
external quay.io/konveyor/oadp-must-gather:latest image via oc adm must-gather.
This eliminates architecture mismatch issues and keeps must-gather code in its
dedicated repository.

Changes:
- Updated RunMustGather() in tests/e2e/lib/apps.go to use oc adm must-gather
- Added MUST_GATHER_IMAGE env var (defaults to quay.io/konveyor/oadp-must-gather:latest)
- Removed build-must-gather target from Makefile
- Removed entire must-gather/ directory (3,174 lines deleted)
- Updated documentation in TESTING.md

The SKIP_MUST_GATHER flag is preserved for skipping must-gather collection.
Version-specific images can be used by setting MUST_GATHER_IMAGE env var.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The directory pattern was hardcoded to 'quay-io-konveyor-oadp-must-gather-*'
which breaks when using custom images via MUST_GATHER_IMAGE env var.

Now dynamically derives the pattern from the actual image name by replacing
registry separators (. / :) with hyphens to match oc adm must-gather's
directory naming convention.

Examples:
- quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-latest-*
- docker.io/myuser/custom:v1 -> docker-io-myuser-custom-v1-*

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
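
The name derivation described in the commit message above amounts to a three-way character replacement. The sketch below reproduces only what the message states; `imageDirPattern` is a hypothetical helper name, and whether `oc adm must-gather` names its output directories this way in every case is the commit message's claim, not something this example verifies (the review elsewhere in this thread suggests a plain wildcard instead).

```go
package main

import (
	"fmt"
	"strings"
)

// imageDirPattern (hypothetical helper) replaces ".", "/" and ":" in an
// image reference with "-" and appends "-*" to form a glob pattern.
func imageDirPattern(image string) string {
	return strings.NewReplacer(".", "-", "/", "-", ":", "-").Replace(image) + "-*"
}

func main() {
	// The two images are the examples given in the commit message.
	for _, img := range []string{
		"quay.io/konveyor/oadp-must-gather:latest",
		"docker.io/myuser/custom:v1",
	} {
		fmt.Printf("%s -> %s\n", img, imageDirPattern(img))
	}
}
```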
@kaovilai kaovilai force-pushed the no-mor-mustgather branch from 2ae6d45 to 4b00056 Compare April 30, 2026 22:51
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2026
@kaovilai
Member Author

/test all

Add build-must-gather Makefile target and MUST_GATHER_REPO/MUST_GATHER_BRANCH
env vars to build must-gather from GitHub source, push to ttl.sh, and use in
e2e tests. Allows testing must-gather PRs without manually publishing images.

Also improves RunMustGather: safe clusterID slice, wildcard glob pattern,
sorted matches for newest, and debug dir listing on failure.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Makefile`:
- Around line 983-994: The help comment for the build-must-gather target is
inconsistent with the actual default; update the comment or the variable so they
match: either change the comment text "Uses MUST_GATHER_BRANCH (default: main)"
to "Uses MUST_GATHER_BRANCH (default: oadp-dev)" to reflect MUST_GATHER_BRANCH
?= oadp-dev, or change the default assignment (MUST_GATHER_BRANCH ?= oadp-dev)
to MUST_GATHER_BRANCH ?= main so the comment stays accurate; edit the Makefile
comment adjacent to the build-must-gather target and ensure MUST_GATHER_BRANCH
is the intended default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eef1a2d9-7cd1-45be-9d8f-b2494a5bf09a

📥 Commits

Reviewing files that changed from the base of the PR and between 4b00056 and 6bdd530.

📒 Files selected for processing (3)
  • Makefile
  • docs/developer/testing/TESTING.md
  • tests/e2e/lib/apps.go
✅ Files skipped from review due to trivial changes (1)
  • docs/developer/testing/TESTING.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/lib/apps.go

Comment thread Makefile
Comment on lines +983 to +994
.PHONY: build-must-gather
build-must-gather: ## Build must-gather image from GitHub source. Requires MUST_GATHER_REPO (e.g., openshift/oadp-must-gather). Uses MUST_GATHER_BRANCH (default: main).
ifeq ($(MUST_GATHER_REPO),)
	$(error MUST_GATHER_REPO is required (e.g., openshift/oadp-must-gather))
endif
	$(eval MUST_GATHER_TMP := $(shell mktemp -d))
	git clone --depth=1 --branch $(MUST_GATHER_BRANCH) https://github.com/$(MUST_GATHER_REPO).git $(MUST_GATHER_TMP)
	$(CONTAINER_TOOL) build --load -t $(MUST_GATHER_IMAGE) -f $(MUST_GATHER_TMP)/Dockerfile $(MUST_GATHER_TMP)
	$(CONTAINER_TOOL) push $(MUST_GATHER_IMAGE)
	rm -rf $(MUST_GATHER_TMP)
	@echo "Must-gather image built and pushed: $(MUST_GATHER_IMAGE)"

Contributor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Documentation inconsistency: comment says "default: main" but actual default is "oadp-dev".

The help comment on line 984 states `Uses MUST_GATHER_BRANCH (default: main)`, but line 935 defines `MUST_GATHER_BRANCH ?= oadp-dev`.

📝 Proposed fix
 .PHONY: build-must-gather
-build-must-gather: ## Build must-gather image from GitHub source. Requires MUST_GATHER_REPO (e.g., openshift/oadp-must-gather). Uses MUST_GATHER_BRANCH (default: main).
+build-must-gather: ## Build must-gather image from GitHub source. Requires MUST_GATHER_REPO (e.g., openshift/oadp-must-gather). Uses MUST_GATHER_BRANCH (default: oadp-dev).
 ifeq ($(MUST_GATHER_REPO),)

@kaovilai
Member Author

kaovilai commented May 1, 2026

/test tide

@kaovilai
Member Author

kaovilai commented May 1, 2026

/test all

@openshift-ci

openshift-ci Bot commented May 1, 2026

@kaovilai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.20-e2e-test-cli-aws | 2ae6d45 | link | true | `/test 4.20-e2e-test-cli-aws` |
| ci/prow/4.21-e2e-test-hcp-aws | 2ae6d45 | link | true | `/test 4.21-e2e-test-hcp-aws` |
| ci/prow/4.21-images | 2ae6d45 | link | true | `/test 4.21-images` |
| ci/prow/4.21-e2e-test-kubevirt-aws | 2ae6d45 | link | true | `/test 4.21-e2e-test-kubevirt-aws` |
| ci/prow/4.21-ci-index | 2ae6d45 | link | true | `/test 4.21-ci-index` |
| ci/prow/4.21-e2e-test-aws | 2ae6d45 | link | true | `/test 4.21-e2e-test-aws` |
| ci/prow/4.23-e2e-test-aws | 6bdd530 | link | false | `/test 4.23-e2e-test-aws` |

Full PR test history. Your PR dashboard.



Labels

- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.

Development

Successfully merging this pull request may close these issues.

Use must-gather from new repo

5 participants